A researcher's guide to the comparative assessment of vocal production learning

Vocal production learning (VPL) is the capacity to learn to produce new vocalizations, which is a rare ability in the animal kingdom and thus far has only been identified in a handful of mammalian taxa and three groups of birds. Over the last few decades, approaches to the demonstration of VPL have varied among taxa, sound production systems and functions. These discrepancies strongly impede direct comparisons between studies. In the light of the growing number of experimental studies reporting VPL, the need for comparability is becoming more and more pressing. The comparative evaluation of VPL across studies would be facilitated by unified and generalized reporting standards, which would allow a better positioning of species on any proposed VPL continuum. In this paper, we specifically highlight five factors influencing the comparability of VPL assessments: (i) comparison to an acoustic baseline, (ii) comprehensive reporting of acoustic parameters, (iii) extended reporting of training conditions and durations, (iv) investigating VPL function via behavioural, perception-based experiments and (v) validation of findings on a neuronal level. These guidelines emphasize the importance of comparability between studies in order to unify the field of vocal learning. This article is part of the theme issue ‘Vocal learning in animals and humans’.


1. The need for comparability and unity across reports of vocal production learning
Vocal production learning (VPL) has been described as 'instances where [acoustic] signals themselves are modified in form as a result of experience with those of other individuals' [1]. The definition of this complex behavioural trait, which is thought to be one of the evolutionary prerequisites for the human capacity for speech, has been revised repeatedly across studies. Among others, VPL has been described as 'matching', 'imitating', 'copying', 'reproducing', 'resembling' and 'vocally mimicking' conspecific, heterospecific or artificially generated acoustic signals. The fickle and varied nature of these descriptions stems partly from the diversity of the ways in which VPL is expressed and partly from the heterogeneity of the ways in which it is measured. As the delimitation of VPL is currently being redefined, as evidenced by this special issue, the evidence used to inform these definitions should at least be comparable, reliable and uniform. However, in different experimental studies describing VPL, the same terminology is often used for sometimes drastically different findings. To compare findings on multidimensional behavioural traits such as VPL within and between species, and independently of the definition in use at the time of a study, it is of utmost importance to report findings as comparably and explicitly as possible. In this paper, we highlight the importance of such comparability between studies and provide guidelines for the unified reporting of VPL evidence. Implementing these guidelines will facilitate the placement of vocal learners within the vocal learning parameter space and allow the comparative assessment of VPL to adapt flexibly to the definition used. The following two examples illustrate the difficulty of the comparative assessment of VPL.
The first example illustrates the discrepancies in the demonstration of VPL via the imitation of human speech. Examples of the imitation of human speech include studies in elephants [2], seals [3], songbirds [4], parrots [5] and cetaceans (belugas [6,7] and killer whales [8]). Some of these studies report the results of months of extensive training, while others report spontaneous mimicry. Assessments of the similarity between animal vocalizations and human-produced acoustic targets vary strongly among these studies. A common evaluation strategy is to enlist human raters either to transcribe the recordings [2] or to judge the acoustic similarity between the recordings and the target sound [6,7]. It is also common to assess the similarity between tutor and tutee vocalizations solely by visual inspection of spectrograms [5,7]. However, repeatable extraction and comparison of acoustic parameters, such as discriminant function analyses or distance matrices based on specific extracted parameters [2,3], are often lacking.
The second example illustrates the difficulty of comparing VPL studies within an animal clade. The capacity for VPL in bats has thus far been indicated for a handful of species [9]. The evidence for this capacity, however, again comes from a variety of study designs and reported parameters. While some studies used experimental designs that specifically modified social group structure (such as isolation studies [10] or transfer studies [11]), others focused on vocal adjustment in response to playbacks [12–15] or on recordings in the wild [16–18]. These studies also vary in the main parameters investigated to assess VPL in bats: some focused on the fundamental frequency [10,15,16], others on bandwidth [11,12] or spectral centroid frequency [14], and yet others used discriminant function analyses to assess a number of parameters in combination [18].
These two approaches to the demonstration of VPL (i.e. human speech mimicry in a variety of taxa and the study of different expressions of VPL in one taxon) have been carried out with considerable variation in both study design and reported parameters. While all of these studies claim to demonstrate VPL, the evidence presented has varied in its success at convincing the scientific community. This scepticism is largely rooted in the inability to weigh the pieces of evidence against one another. Applying different approaches to the demonstration of VPL is, of course, crucial for such a diverse field of study, but the resulting lack of comparability is a key obstacle in VPL research. To allow the inter- and intraspecific comparison of VPL capacity in the future, measured and reported parameters need to be comparable across a wide array of studies.
The assessment of VPL capacity can often be reduced to a test of similarity between tutee and tutor (conspecific, heterospecific or artificial) and, in the long run, also to the judgement of qualities such as the novelty and complexity of the observed vocal imitations. Whether we assess the vocal imitation of single acoustic parameters or of all parameters combined, we are usually asking whether the individual or species has the capacity to learn (to imitate) an acoustic signal precisely (VPL quality). But what do VPL studies mean and report when they describe the precision or quality of imitation? In experimental studies, the VPL trait is often evaluated on the basis of comparisons within an acoustic parameter space. Especially for zebra finches, currently the most studied VPL model species [19], several automatic algorithms have been developed to assess the similarity of their calls or songs [20,21]. However, these algorithms are often quite specific to their focal species and depend on laboratory recording conditions. In the wild, VPL is often more subtle and harder to demonstrate owing to the lack of controlled recording conditions. Most importantly, in the wild, not only the change within the acoustic parameter space matters; the behavioural response to, and the social reinforcement of, the trait are also important for a comprehensive assessment of VPL. We therefore focus here on two types of demonstration of VPL: the bioacoustic, analytical evaluation and the behavioural or neuronal evaluation of acoustic differences and similarities. The evaluation of VPL can thus be twofold and concern the acoustic parameter space and/or the behavioural decision-making/perceptual space. Both spaces can be modified by VPL, and both can be assessed when VPL quality is studied.
In the following, we present guidelines for reported acoustic parameters and additional studies, which help to comprehensively describe a species' capacity for VPL, assess possibilities for external validation, and ultimately make the findings available for cross-species comparisons of VPL.
2. The acoustic parameter space
(a) The need for a robust baseline in order to assess vocal production learning precision (and novelty)
For the assessment of VPL, a species' typical acoustic variation needs to be considered as a baseline. Only by considering the species-specific vocal repertoire and the inter- and intraindividual vocal variation can we discriminate learned from experience-independent vocalizations and, moreover, assess where a 'new' vocalization falls within the species' distribution of the acoustic feature in focus. Judging novelty is one of the hardest tasks in the field of vocal learning; nevertheless, novelty is often considered one of the clearest distinctions between vocal usage learning and VPL [1,22]. But when is something novel, and how much does the novel signal need to differ from existing calls? These questions need to be answered with reference to the species-specific vocal repertoire. The variation of parameters in the acoustic environment and a subject's pre-exposure repertoire can give us an idea of the variability and the importance (i.e. the behavioural relevance for the species) of different acoustic parameters. For example, if a 'novel vocalization' is defined as lying outside the acoustic 'feature space' of the species' natural variation, the species' or population's mean, together with its variation, needs to be consulted as ground truth. Knowledge about a species' typical vocal variation also has implications for experimental study design. The generation of artificial acoustic targets for imitation studies needs to be informed by the species' vocal baseline to avoid either overlapping with the pre-existing repertoire or exceeding the species' physiological limits of sound production.
The acquisition of such baseline data (i.e. vocal repertoires and intraspecific vocal variation) would ideally consist of recordings of all behavioural contexts, life-history events, social interactions and developmental stages from several individuals of both sexes, possibly from several geographical locations. Acquiring such complete repertoires is extremely challenging, but they can be convincingly approximated. Long-term acoustic recordings of wild and captive animals often reveal the complexity of the vocal repertoire and allow educated guesses about the comprehensiveness of the recordings (e.g. for birds [23,24] and for mammals [25,26]). Such baseline knowledge of the vocal repertoire is crucial for demonstrating that a recorded call is indeed novel and did not pre-exist in the animal's repertoire. The difficulty of evaluating vocal novelty highlights not only the importance of clarifying which precise definition of VPL is applied but also the significance of a detailed reporting culture. The acquisition of reference data, such as call repertoires or baseline calls, should be a prerequisite for assessing the origin of newly arisen changes in vocal parameters.
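As a minimal illustration of how a baseline can anchor a novelty judgement, the Python sketch below z-scores each acoustic feature of a candidate call against a pre-exposure baseline repertoire and flags the call when its standardized distance from the baseline mean exceeds a cut-off. The function names, the example feature values and the cut-off of 3 are hypothetical choices, not an established criterion. The approach also assumes roughly normally distributed, uncorrelated features; when features covary, a Mahalanobis distance or a density estimate of the feature space would be the more appropriate measure.

```python
import math
import statistics


def baseline_stats(baseline):
    """Per-feature mean and sample standard deviation of a baseline repertoire.

    `baseline` is a list of feature vectors, one per recorded call, with the
    features always in the same order (e.g. [f0_kHz, duration_ms]).
    Each feature needs at least two calls and some variation (sd > 0).
    """
    columns = list(zip(*baseline))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds


def standardized_distance(call, means, sds):
    """Euclidean distance from the baseline mean after z-scoring every feature."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(call, means, sds)))


def is_outside_baseline(call, baseline, cutoff=3.0):
    """Flag a call lying farther than `cutoff` (hypothetical) from the baseline mean."""
    means, sds = baseline_stats(baseline)
    return standardized_distance(call, means, sds) > cutoff


# Hypothetical baseline: [f0 in kHz, duration in ms] for five pre-exposure calls
baseline = [[10.0, 50.0], [11.0, 55.0], [9.0, 45.0], [10.5, 52.0], [9.5, 48.0]]
print(is_outside_baseline([10.2, 51.0], baseline))  # False: within natural variation
print(is_outside_baseline([15.0, 80.0], baseline))  # True: far outside the baseline
```

Whatever criterion is chosen, reporting it together with the size and composition of the baseline sample allows others to re-evaluate a novelty claim under a different definition.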

(b) Proposed reporting of an acoustic parameter space
When proposing a parameter space or a list of parameters to be reported, several caveats apply. Even though we want to stress the importance of comparability, there are always conditions under which recording or reporting certain parameters is not possible. For example, experimental conditions might prevent certain parameters from being recorded: artificial or natural background noise, constraints within the recording chain (limited sample rate, frequency range of the microphone/hydrophone, etc.), the acoustic character of the recording site (transmission loss, filtering characteristics, reverberation), the distance from the sound source and the observability of the animal under investigation. With these caveats in mind, acoustic parameters must be reported as comprehensively as possible. The more parameters are reported, the better, because every additional parameter broadens the basis for cross-species comparisons. Several papers and guides focusing on bioacoustic recording and reporting standards have been published [27,28].
This and comparable literature should be consulted before designing bioacoustic experiments intended to demonstrate VPL. Here, we list a number of acoustic parameters that are often reported in studies investigating VPL and that would facilitate comparisons between studies if they were reported comprehensively in all VPL studies (table 1). Given the diversity of vocalizations and their modes of production, ranging from nearly pure tones through complex tonal or 'noisy' vocal structures and rhythms to clicks, it is important to keep in mind that not all parameters are equally well suited to characterize every vocalization. However, the acoustic parameters listed in table 1 are well suited to give at least a simple description of most types of vocalizations.
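To make the definitions behind a few entries of table 1 concrete, the sketch below computes peak frequency, spectral centroid, a bandwidth measure and element duration for a single digitized element. It is deliberately minimal: a direct DFT in plain Python without windowing, a bandwidth defined as the span of spectral bins within 10 dB of the peak, and a duration defined by a fixed amplitude threshold. These definitions and the function names are assumptions for illustration; published studies define such parameters in different ways, which is one reason why the exact definition and analysis settings should be reported together with the values.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Single-sided magnitude spectrum via a direct (slow) DFT; fine for short elements."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        freqs.append(k * sample_rate / n)
        mags.append(abs(coeff))
    return freqs, mags


def peak_frequency(freqs, mags):
    """Frequency of the spectral bin with the highest magnitude."""
    return freqs[max(range(len(mags)), key=mags.__getitem__)]


def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency of the spectrum."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)


def bandwidth(freqs, mags, drop_db=10.0):
    """Span between the lowest and highest bins lying within `drop_db` of the peak."""
    limit = max(mags) * 10 ** (-drop_db / 20)
    inside = [f for f, m in zip(freqs, mags) if m >= limit]
    return max(inside) - min(inside)


def element_duration(samples, sample_rate, threshold):
    """Time from the first to the last sample whose absolute amplitude exceeds `threshold`."""
    above = [i for i, x in enumerate(samples) if abs(x) > threshold]
    return (above[-1] - above[0]) / sample_rate if above else 0.0


# Synthetic element: 1 kHz tone plus a weaker 1.5 kHz component, 50 ms at 8 kHz
sr = 8000
element = [math.sin(2 * math.pi * 1000 * t / sr) + 0.5 * math.sin(2 * math.pi * 1500 * t / sr)
           for t in range(400)]
freqs, mags = magnitude_spectrum(element, sr)
print(peak_frequency(freqs, mags))             # 1000.0
print(round(spectral_centroid(freqs, mags)))   # 1167
print(bandwidth(freqs, mags))                  # 500.0
```

In practice, windowed FFT-based implementations in established analysis software would be used; the point here is only that each reported value is tied to an explicit definition.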
Table 1. List of commonly measured acoustic parameters, which are often reported very selectively in studies investigating VPL and/or characterizing species' vocal repertoires. A comparative approach to the assessment of VPL would be greatly facilitated by the comprehensive reporting of as many of these parameters as possible throughout VPL studies. Exemplary references are given for each parameter. Note that not all parameters are equally well suited to characterize tonal and non-tonal vocalizations. We use the term 'element' here to indicate diverse kinds of vocalizations, such as calls, pulses, clicks and buzzes.
Spectral parameters: fundamental frequency (f0) [3,10,14,24,26,29]; minimum, maximum f0 [2,14,26,29]; start, end frequency of f0 [24,26]; dominant harmonics [10]; peak frequency [3,14,26]; spectral centroid [10,14,24,26]; bandwidth (various measures) [2,11,14,24,26,29]; minimum, maximum frequency [26,29]; frequency modulation [10,24,26,29,30]; formant frequencies [2,3,10,24]; spectral envelope skewness and kurtosis [24]; (spectral envelope) entropy/aperiodicity [10,14,24,26,30]
Amplitude characteristics: minimum, maximum level [26]; envelope peaks [11]; envelope skewness and kurtosis [24]; envelope entropy [10,24]
Temporal parameters: element duration [10,11,14,24,26,29]; (inter-)element interval [11,26,29]; element rate (vocal activity) [11,26]; rhythm [31]; time to maximum amplitude [29]
Sequential characteristics: order of elements [26,29,30]; order of sequences [30]
royalsocietypublishing.org/journal/rstb Phil. Trans. R. Soc. B 376: 20200237
The software and algorithms commonly used to assess differences in vocalization parameters are varied but should, in the best case, yield comparable outcomes. Acoustic analyses are regularly conducted with commercial or open-source software (e.g. PRAAT [32], Raven [33], Avisoft [34], Sound Analysis Pro [20], SIGNAL (Engineering Design, Berkeley, USA), Audacity® [35], Lucinia [36]) or with self-written code in Matlab, Python, C++ or R. Some programs provide toolboxes or packages that already include perceptual algorithms, such as a built-in mel scale (e.g. the Matlab speech processing toolbox [37]). The drawback of these programs is that they are usually not ideal for large-scale batch processing of sound recordings. This is where clustering algorithms show their true capability; however, they require initial human validation and should not be trusted blindly. One such algorithm is VoICE [38], which aims to increase comparability across labs and species by unifying the analytical approach. Simply put, this algorithm scores the acoustic similarity of vocal output and sorts it into a hierarchical cluster tree. A different approach to classifying vocal output was described by Valente and colleagues [39]: the vocal output of an individual juvenile zebra finch was segmented and then analysed by calculating the Euclidean distance to segments of the tutor song. A match occurred when a similarity threshold was crossed, and the results of this automated process matched those of human pattern-recognition analysis. Such similarity-recognition algorithms often originate from human speech recognition (e.g. Gaussian mixture models [40]) and operate on the spectral content of vocalizations represented by mel-frequency cepstral coefficients. A similar approach has also been used successfully to quantify variation in the structure of acoustic signals in bats [41,42] and other mammals (e.g. deer [43], odontocetes [44] and elephants [45]). All programs and analysis algorithms have their benefits and drawbacks. It is important to research thoroughly which method is best suited to the data. In doing so, we need to keep in mind that the judgement of a method's suitability can always be influenced by the judge's past experience.
Just as the VPL signal receiver might have learned preferences for a specific type of signal, researchers might have a learned preference for analysis software. Furthermore, some approaches may be better suited to assessing sequential correlations, whereas others are better at spectral comparisons [21]. Cross-comparison, not only between species but also across analysis platforms, can identify robust acoustic signals as well as nuances that may be specific to the analysis method. The chosen analysis method should be re-evaluated to check whether the software is up to date and remains the most useful tool for the study design at hand.
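The segment-matching logic described above for the approach of Valente and colleagues [39] (Euclidean distance between tutee and tutor segments, with a match declared once a similarity threshold is crossed) can be sketched as follows. This is not their implementation: the feature vectors (which in practice might, for example, summarize the mel-frequency cepstral coefficients of each segment), the distance threshold and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_segments(tutee_segments, tutor_segments, max_distance):
    """Pair each tutee segment with its nearest tutor segment.

    Returns a list of (tutor_index, distance); tutor_index is None when even the
    nearest tutor segment is farther away than `max_distance` (no match).
    """
    matches = []
    for segment in tutee_segments:
        distances = [euclidean(segment, tutor) for tutor in tutor_segments]
        best = min(range(len(distances)), key=distances.__getitem__)
        index = best if distances[best] <= max_distance else None
        matches.append((index, distances[best]))
    return matches


def proportion_matched(matches):
    """Fraction of tutee segments that matched some tutor segment."""
    return sum(index is not None for index, _ in matches) / len(matches)


# Hypothetical two-dimensional feature vectors (e.g. z-scored f0 and duration)
tutor = [[0.0, 0.0], [10.0, 10.0]]
tutee = [[0.5, 0.0], [9.0, 10.0], [5.0, 5.0]]
matches = match_segments(tutee, tutor, max_distance=1.5)
print(matches[:2])                  # [(0, 0.5), (1, 1.0)]
print(proportion_matched(matches))  # 0.6666666666666666
```

Reporting the feature set and the distance threshold together with the proportion of matched segments would allow such scores to be compared across studies.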

(c) Proposed extended reporting of training conditions and curves
Aside from the spectro-temporal parameters measured in different experimental conditions and the analysis software or strategy employed, several other factors require consideration and should be reported in order to facilitate VPL comparisons and the relative positioning of species on any proposed VPL continuum. For example, the time needed to achieve a certain degree of similarity should be considered. If limited training over a few days results in the same degree of imitation of a specific template that a different species reaches only through constant training over weeks, this should be reported. The same applies to the conditions of animal housing outside of the experiment. Isolated animals might be more motivated to participate in 'enriching' experiments, while animals kept with conspecifics might take longer to internalize a task or change. The comprehensive and comparative assessment of a species' VPL capacity extends even further, to the number of trained individuals. Unreported preliminary studies that select for good learners obscure how many individuals are actually willing and/or able to learn the task.
Reporting the overall number of trained individuals does not in itself indicate a species' capacity for VPL, but it would help to assess the species' overall willingness to learn the VPL task. This could help researchers to select suitable model species and to decide on required sample sizes.
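Training curves of the kind discussed above can be summarized in a simple, reportable form. The sketch below assumes that a similarity score between each session's vocalizations and the target has already been computed (by whatever method the study uses). It returns the number of sessions each individual needed to reach a criterion and keeps individuals that never reached it in the summary instead of discarding them. The criterion of 0.8, the requirement of two consecutive sessions, the example scores and the function names are all hypothetical.

```python
import statistics


def sessions_to_criterion(similarity_by_session, criterion, consecutive=1):
    """Number of sessions needed until similarity to the target has been >= `criterion`
    in `consecutive` sessions in a row; None if the criterion is never reached."""
    run = 0
    for session, score in enumerate(similarity_by_session, start=1):
        run = run + 1 if score >= criterion else 0
        if run == consecutive:
            return session
    return None


def summarize_cohort(curves, criterion, consecutive=1):
    """Summarize every trained individual, including those that never reached criterion."""
    per_individual = {ind: sessions_to_criterion(scores, criterion, consecutive)
                      for ind, scores in curves.items()}
    reached = [s for s in per_individual.values() if s is not None]
    return {
        "n_trained": len(per_individual),
        "n_reached": len(reached),
        "median_sessions": statistics.median(reached) if reached else None,
        "per_individual": per_individual,
    }


# Hypothetical per-session similarity scores (0 = no similarity, 1 = identical to target)
curves = {
    "A": [0.20, 0.50, 0.80, 0.85, 0.90],
    "B": [0.10, 0.20, 0.30, 0.30, 0.40],
    "C": [0.80, 0.60, 0.82, 0.81, 0.90],
}
print(summarize_cohort(curves, criterion=0.8, consecutive=2))
```

Keeping non-learners (here individual "B") in the report is what allows readers to judge the proportion of individuals willing and/or able to learn the task.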

3. The behavioural decision-making space
(a) Behavioural evaluation of vocal production learning
When assessing VPL, bioacoustic measurements are often evaluated with a test of statistical significance. Such a test might indicate a significant difference between initial and trained vocalizations, and yet the meaning or behavioural relevance of the vocalization might still be the same for the species. Conversely, a statistical test might indicate that a small difference in acoustic measurements is not significant, and yet this small difference might have a marked effect on the communicative function of the vocalizations. The degree of change in vocalizations observed with quantitative measurement methods is thus not necessarily a good approximation of the biological relevance of a change in the signal. For example, Nowicki and colleagues showed that female song sparrows responded significantly more strongly to songs that had been learned slightly better, demonstrating that variation in learning abilities plays a significant functional role in sexual selection [46]. To assess the biological importance of a learned vocal modification, external validation is helpful and often necessary. This external validation can be achieved with, for example, behavioural assays or the neuronal representation of change, and it provides a functional assessment of VPL. Several approaches can be taken if an external validation of VPL is desired, and the choice between them depends entirely on the aim of the study. In the most common case, the quality of imitation is demonstrated by showing that the difference between imitation and target is too small for the receiver to discriminate. This means that vocal imitation can be demonstrated by a behavioural test for acoustic indistinguishability. Another way to demonstrate VPL would be to imitate a heterospecific vocalization well enough to convey a meaningful message. This is the case in the study mentioned above, in which humans discriminated words vocalized by an elephant [2].
The simplest idea would be to train animals to discriminate between novel and known vocalizations. This has been done in, among others, zebra finches [47], starlings [48] and swamp sparrows [49], which discriminated between different conspecific syllables or songs.
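Performance in such discrimination tests is usually compared with chance. A minimal sketch, assuming a two-alternative task with independent trials (chance = 0.5), computes a one-sided exact binomial p-value with the Python standard library; the function name and trial numbers are hypothetical. Note that a non-significant result does not by itself demonstrate acoustic indistinguishability: that conclusion also requires enough trials and subjects to have detected a difference had there been one.

```python
from math import comb


def p_above_chance(correct, trials, chance=0.5):
    """One-sided exact binomial test: probability of at least `correct` successes
    in `trials` independent trials if the subject only performs at chance."""
    return sum(comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
               for k in range(correct, trials + 1))


# Hypothetical two-alternative discrimination test (imitation vs. target), 10 trials
print(p_above_chance(9, 10))  # 0.0107421875 (= 11/1024): above chance at alpha = 0.05
print(p_above_chance(5, 10))  # 0.623046875: no evidence of discrimination
```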
Comparable experiments with wild animals are also conceivable. For example, if an animal were trained to imitate the dialect of a foreign population, the success of this VPL experiment could be quantified by a phonotaxis-based behavioural experiment. Similar approaches have been used in discrimination experiments in the past. Playback experiments with greater spear-nosed bats indicated that these bats could discriminate between individuals from different caves, as well as between individuals from their own social group and foreign groups, based solely on call structure [50]. Using a similar experimental design, Knörnschild and colleagues [51] demonstrated that playbacks of local territorial songs of male sac-winged bats attracted females more strongly than foreign territorial songs. Both examples show that the assessment of vocal divergence is feasible in these species. Other acoustic discrimination experiments are also conceivable for the behavioural assessment of VPL: a study in nightingales showed that males increased their sound level significantly when presented with playbacks of conspecific rivals, but changed their sound level only slightly when the playback was of a heterospecific bird [52]. A study in marmots and ground squirrels showed that the meaning or information of a vocalization can be extracted even if it is emitted by a heterospecific [53], demonstrating that the behavioural relevance of a vocalization can be maintained even if the acoustic parameters vary significantly between emitters. The opposite was shown for songs of isolation-reared song sparrows, which resemble natural conspecific song in several respects but do not elicit the same response in conspecifics [54].
The aforementioned experiments relied on the reactions of freely behaving wild animals, which is only feasible in studies where observation is possible, a luxury often not available when working with, for example, marine or solitary animals. Therefore, as mentioned above, direct human validation of the emitted vocalizations is often still required.
Another way to evaluate the degree of vocal imitation and to report it in a comparable way is to assess the observed vocal change in relation to the perception of the signal (e.g. the auditory filters of the receiver of the newly learned signal). There is no biological need to imitate a signal perfectly if the sensory system of the receiver cannot discern the minute differences between the target and the imitated signal. To validate the discriminability of newly learned and existing vocalizations, models of the auditory periphery for humans and animals [55–57] should be used. These models create a spectro-temporal representation of sounds as a function of tonotopic frequency and time, the so-called auditory spectrogram [55]. This approach is typically used to describe the physiological mechanisms underlying the perception of certain acoustic parameters and to explain human and animal performance in psychophysical discrimination tasks. Models of the auditory periphery typically implement middle-ear and cochlear functions (e.g. middle-ear transfer characteristics, frequency-to-place conversion, nonlinear transformations in the organ of Corti, temporal integration) combined with a decision device acting as an optimal detector based on the principles of signal detection theory [58,59]. For example, these models have been successfully used to describe behavioural discrimination performance in bats [57,60,61] and could well be used to evaluate VPL from the receiver's perspective. However, such models are often based on approximations, because experimentally derived information about model parameters is not available for many animal species. An open-source toolbox including numerous models for different stages of the (human) auditory system is available (http://amtoolbox.sourceforge.net), which might also serve as a basis for applications in other species.
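The final decision stage of such models, and behavioural discrimination data themselves, are commonly summarized with the signal detection theory sensitivity index d' = z(H) - z(F), where H is the hit rate, F the false-alarm rate and z the inverse of the standard normal cumulative distribution function. The sketch below computes d' with the Python standard library and applies a log-linear correction (adding 0.5 to each count and 1 to each number of trials) so that perfect scores do not produce infinite values. It does not model the auditory periphery itself; the trial counts and function names are hypothetical, and the choice of a discrimination threshold (d' = 1 is one common convention) should be reported.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(F) from signal detection theory."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def d_prime_from_counts(hits, n_signal_trials, false_alarms, n_noise_trials):
    """d' with a log-linear correction, which keeps both rates strictly between 0 and 1
    (inv_cdf is undefined at exactly 0 or 1)."""
    hit_rate = (hits + 0.5) / (n_signal_trials + 1)
    false_alarm_rate = (false_alarms + 0.5) / (n_noise_trials + 1)
    return d_prime(hit_rate, false_alarm_rate)


# Hypothetical go/no-go counts: 18 hits in 20 signal trials, 4 false alarms in 20 noise trials
print(round(d_prime_from_counts(18, 20, 4, 20), 2))
```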
The judgement of imitation quality by a conspecific, heterospecific or perceptual model is an important criterion to assess the biological function of VPL and is critical for the investigation of the evolutionary origins of this trait. Therefore, a behavioural approach to quantify the extent and function of VPL in a species should always be considered as an important supporting experiment.

(b) Neuronal validation of vocal production learning via the receiver
The comparative approach will allow researchers to draw conclusions about the relative VPL capacity of species and thus help to uncover the biological basis of this trait [19]. A nuanced and comprehensive comparison of the VPL capacity of a preferably large number of species will ultimately provide insights into the neuronal basis of the human capacity for speech. VPL, just like speech, requires a high degree of auditory plasticity, as it involves the use of auditory feedback to coordinate audio-vocal interactions while learning new vocalizations or while maintaining the stability of existing vocalizations [62]. Therefore, neural responses in the auditory pathway should become selective for the new vocalizations to be learned, leading to an enhanced representation of new vocal elements [63]. Although we are aware that recording neural responses from auditory brain regions cannot be a standard procedure in every animal model used to study VPL, we want to briefly highlight the importance of the comparative study of the neuronal basis of VPL. While the changes occurring in the neural network involved in vocal imitation and production have been studied in detail (reviewed, for example, in [64,65]), the role of auditory forebrain areas in providing sensory feedback in VPL has been studied less thoroughly, although it is receiving more and more attention [66]. In songbirds, neurons in the auditory forebrain were shown to encode information not only about the category of a vocalization but also about the identity of the emitter [67,68]. We here focus on the neural representation of newly learned vocalizations in forebrain areas involved in processing auditory input.
The first evidence for an emerging neuronal response selectivity for learned conspecific vocalizations in areas outside the song control system (a network of brain nuclei specialized for singing and song learning) came from the aforementioned study on adult starlings, which were trained to discriminate between conspecific songs [48]. Extracellular recordings (under anaesthesia) from neurons in a non-primary auditory forebrain region revealed a population of neurons that responded more strongly to familiar songs used in the training sessions than to novel songs. Thus, experience-driven plasticity seems to modify neural responses (and therefore the representation of conspecific vocalizations) according to the functional demands of song recognition. While these early results came from adult birds, a recent study showed specific changes in the neuronal representation of songs in juveniles raised with heterospecific tutors [69]. Its authors demonstrated that tuning for conspecific songs arises in the primary auditory cortical circuit of finches, as neurons responded more strongly to conspecific songs than to songs of other species. Furthermore, this cortical representation could be shifted towards the songs of a tutor species in cross-fostering experiments: the spectro-temporal tuning properties of neurons were altered to fit the spectro-temporal modulations of the learned song [69]. These findings support the idea that experience-dependent mechanisms might promote the alignment of auditory responses with the output of newly learned vocal motor behaviour. Results from other studies on starlings point in the same direction: male starlings raised without direct contact with adults not only failed to develop typical song classes, but neurons in their caudomedial nidopallium (NCM, an analogue of the mammalian secondary auditory cortex [70]) also failed to develop differential responses to different functional classes of song [71].
By contrast, differential NCM responses have been demonstrated in wild-caught starlings [72].
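Response differences such as the stronger responses to familiar than to novel songs described above are often summarized per neuron with a normalized selectivity index, (R_A - R_B)/(R_A + R_B), where R_A and R_B are the mean responses to two stimulus classes. The sketch below computes this index; it is one common convention rather than necessarily the measure used in the cited studies [48,69,71], and it assumes non-negative responses such as spike counts (baseline-subtracted rates can push the index outside [-1, 1]). The spike counts and the function name are hypothetical.

```python
import statistics


def selectivity_index(responses_a, responses_b):
    """Normalized difference between mean responses to stimulus classes A and B.

    Assumes non-negative responses (e.g. spike counts per presentation).
    +1: responds only to A; -1: responds only to B; 0: no preference.
    """
    mean_a = statistics.fmean(responses_a)
    mean_b = statistics.fmean(responses_b)
    total = mean_a + mean_b
    return 0.0 if total == 0 else (mean_a - mean_b) / total


# Hypothetical spike counts of one neuron to trained (familiar) versus novel songs
familiar = [12, 14, 10]
novel = [6, 5, 7]
print(selectivity_index(familiar, novel))  # 0.3333333333333333
```

Comparing the distribution of such indices before and after training, or between trained and untrained stimuli, is one way in which a neuronal validation of VPL could be reported in a comparable form.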
In mammals, evidence for changes in the sensory representation of species-specific communication calls due to VPL is still lacking. However, call-selective cortical neurons have been described in non-human primates [73,74]. It can therefore be assumed that changes similar to those in songbirds can also be expected in the auditory forebrain areas of mammalian species capable of VPL. However, it is important to note that changes in sensory representation in the auditory forebrain of birds and mammals can also occur independently of VPL, e.g. as a result of experience-dependent plasticity [75–78]. Where applicable, the neuronal validation of VPL through plastic changes in the sensory representation of species-specific vocalizations might, therefore, be an interesting additional tool for comprehensively investigating the VPL capacity of a species. In addition to the investigation of neural activity and responses by means of electrophysiological recordings, genetic methods can be used for the evaluation of acoustic signals. Specifically, immediate early gene expression has been used to identify active brain regions, e.g. during singing and song learning [79,80] and during the perception of categorically different acoustic stimuli [81].

4. Conclusion and outlook
The aim of the guidelines provided here is to achieve a wide-reaching comparability in future reporting of findings concerning the VPL capacity of species. In order to achieve this, we highlight the importance of five factors influencing the unification of the VPL assessment: (i) comparison of vocal change to a well-established baseline, (ii) comprehensive reporting of acoustic parameters (not only significant ones), (iii) extended reporting of training conditions and durations, (iv) investigating VPL function via behavioural, perception-based experiments and (v) validation of findings on a neuronal level via the receiver. While the VPL capacity of a species can be successfully demonstrated without the inclusion of these factors, the comparison of cross-species VPL capacities is vitally dependent on our joint efforts to comprehensively study and report these factors.
A research culture in which a wide range of acoustic parameters is routinely reported would allow us to draw conclusions about the VPL capacity of species independently of the current definition of vocal learning. Specifically, once comprehensive reporting of acoustic parameters is achieved, VPL capacity can easily be reassessed in cases of terminological or functional redefinition, e.g. owing to the discovery of new mechanisms or forms of VPL. We commend authors who already follow these reporting guidelines, for whom this paper should serve simply as a gentle reminder. However, the literature shows that the majority of reported studies do not yet do so, and we hope this paper can serve as a guideline for both study design and the reporting of findings, thereby promoting future comparability between studies. In the future, approaches for a less human-centric evaluation of VPL will likely become more readily available. However, until we reach this golden age of easy, species-specific, perception-based evaluation algorithms, an important improvement to current scientific practice would be to evaluate VPL-related findings through the measurement of behavioural responses. Where such experiments are not feasible, proposing well-designed follow-up experiments that would demonstrate the behavioural importance of the findings would benefit the field and increase future comparability between studies evaluating the VPL capacities of species.
Data accessibility. This article has no additional data.
Authors' contributions. All authors participated in the 'Unifying Vocal Learning' workshop in Leiden, 2019, laying the conceptual groundwork for the manuscript. E.Z.L. finalized the concept and wrote the outline of the manuscript. E.Z.L., U.F. and S.G.H. wrote large sections of the paper. All authors worked on finalizing the paper and critically revised the manuscript.